{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Expected SARSA \n",
    "\n",
    "We will be using **TD control method of Expected SARSA** on Cliff World environment as given below. Expected SARSA takes the average over $Q(S',A')$ weighed by the policy. Here we will using the same policy to explore and same policy to take expectation and hence it is a on-policy SARSA. However, depending on the need, we could have agent trying to learn a ε-greedy policy but use a more exploratory (i.e. with higher ε value) policy as its behavior policy to generate samples.\n",
    "\n",
    "![GridWorld](./images/cliffworld.png \"Cliff World\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Expected SARA  Update equation\n",
    "\n",
    "Expected SARA Q Learning control is carried by sampling step by step and updating Q values at each step. We use ε-greedy policy to explore and generate samples. And we update the Q value by taking expectation over the policy. The Update equation is given below:\n",
    "\n",
    "$$ Q(S,A) ← Q(S,A) + α * [ R + γ * \\sum_{A'} \\pi(A'|S')* Q(S’,A’) – Q(S,A)] $$\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial imports and enviroment setup\n",
    "import gym\n",
    "import gym.envs.toy_text\n",
    "import sys\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Expected SARSA Learning agent class\n",
    "from collections import defaultdict\n",
    "\n",
    "\n",
    "class ExpectedSARSAAgent:\n",
    "    def __init__(self, alpha, epsilon, gamma, get_possible_actions):\n",
    "        self.get_possible_actions = get_possible_actions\n",
    "        self.alpha = alpha\n",
    "        self.epsilon = epsilon\n",
    "        self.gamma = gamma\n",
    "        self._Q = defaultdict(lambda: defaultdict(lambda: 0))\n",
    "\n",
    "    def get_Q(self, state, action):\n",
    "        return self._Q[state][action]\n",
    "\n",
    "    def set_Q(self, state, action, value):\n",
    "        self._Q[state][action] = value\n",
    "\n",
    "    # Expected SARSA Update\n",
    "    def update(self, state, action, reward, next_state, done):\n",
    "        if not done:\n",
    "            best_next_action = self.max_action(next_state)\n",
    "            actions = self.get_possible_actions(next_state)\n",
    "            next_q = 0\n",
    "            for next_action in actions:\n",
    "                if next_action == best_next_action:\n",
    "                    next_q += (1-self.epsilon+self.epsilon/len(actions)) * \\\n",
    "                               self.get_Q(next_state, next_action)\n",
    "                else:\n",
    "                    next_q += (self.epsilon/len(actions)) * \\\n",
    "                               self.get_Q(next_state, next_action)\n",
    "\n",
    "            td_error = reward + self.gamma * next_q - self.get_Q(state, action)\n",
    "        else:\n",
    "            td_error = reward - self.get_Q(state, action)\n",
    "\n",
    "        new_value = self.get_Q(state, action) + self.alpha * td_error\n",
    "        self.set_Q(state, action, new_value)\n",
    "\n",
    "    # get best A for Q(S,A) which maximizes the Q(S,a) for actions in state S\n",
    "    def max_action(self, state):\n",
    "        actions = self.get_possible_actions(state)\n",
    "        best_action = []\n",
    "        best_q_value = float(\"-inf\")\n",
    "\n",
    "        for action in actions:\n",
    "            q_s_a = self.get_Q(state, action)\n",
    "            if q_s_a > best_q_value:\n",
    "                best_action = [action]\n",
    "                best_q_value = q_s_a\n",
    "            elif q_s_a == best_q_value:\n",
    "                best_action.append(action)\n",
    "        return np.random.choice(np.array(best_action))\n",
    "\n",
    "    # choose action as per epsilon-greedy policy for exploration\n",
    "    def get_action(self, state):\n",
    "        actions = self.get_possible_actions(state)\n",
    "\n",
    "        if len(actions) == 0:\n",
    "            return None\n",
    "\n",
    "        if np.random.random() < self.epsilon:\n",
    "            a = np.random.choice(actions)\n",
    "            return a\n",
    "        else:\n",
    "            a = self.max_action(state)\n",
    "            return a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# training algorithm\n",
    "def train_agent(env, agent, episode_cnt=10000, tmax=10000, anneal_eps=True):\n",
    "    episode_rewards = []\n",
    "    for i in range(episode_cnt):\n",
    "        G = 0\n",
    "        state = env.reset()\n",
    "        for t in range(tmax):\n",
    "            action = agent.get_action(state)\n",
    "            next_state, reward, done, _ = env.step(action)\n",
    "            agent.update(state, action, reward, next_state, done)\n",
    "            G += reward\n",
    "            if done:\n",
    "                episode_rewards.append(G)\n",
    "                # to reduce the exploration probability epsilon over the\n",
    "                # training period. YOu can set the flag to False\n",
    "                # and see the impact it has on the episode rewards\n",
    "                if anneal_eps:\n",
    "                    agent.epsilon = agent.epsilon * 0.99\n",
    "                break\n",
    "            state = next_state\n",
    "    return np.array(episode_rewards)"
   ]
  },
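  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When `anneal_eps` is left at `True`, the training loop above multiplies ε by 0.99 at the end of every completed episode. As a rough sketch (assuming every episode terminates within `tmax` steps, so the decay is applied once per episode), the exploration rate after $k$ episodes is:\n",
    "\n",
    "$$ \\epsilon_k = \\epsilon_0 \\cdot 0.99^{k} $$\n",
    "\n",
    "For example, starting from $\\epsilon_0 = 0.2$, this gives roughly $0.2 \\cdot 0.99^{100} \\approx 0.073$ after 100 episodes and about $0.0013$ after 500, so the behavior becomes almost greedy early in a run of several thousand episodes."
   ]
  },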
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot rewards\n",
    "def plot_rewards(env_name, rewards, label):\n",
    "    plt.title(\"env={}, Mean reward = {:.1f}\".format(env_name,\n",
    "                                                    np.mean(rewards[-20:])))\n",
    "    plt.plot(rewards, label=label)\n",
    "    plt.grid()\n",
    "    plt.legend()\n",
    "    plt.ylim(-300, 0)\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print policy Cliff world\n",
    "def print_policy(env, agent):\n",
    "    nR, nC = env._cliff.shape\n",
    "\n",
    "    actions = '^>v<'\n",
    "\n",
    "    for y in range(nR):\n",
    "        for x in range(nC):\n",
    "            if env._cliff[y, x]:\n",
    "                print(\" C \", end='')\n",
    "            elif (y * nC + x) == env.start_state_index:\n",
    "                print(\" X \", end='')\n",
    "            elif (y * nC + x) == nR * nC - 1:\n",
    "                print(\" T \", end='')\n",
    "            else:\n",
    "                print(\" %s \" %\n",
    "                      actions[agent.max_action(y * nC + x)], end='')\n",
    "        print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "    This is a simple implementation of the Gridworld Cliff\n",
      "    reinforcement learning task.\n",
      "\n",
      "    Adapted from Example 6.6 (page 106) from Reinforcement Learning: An Introduction\n",
      "    by Sutton and Barto:\n",
      "    http://incompleteideas.net/book/bookdraft2018jan1.pdf\n",
      "\n",
      "    With inspiration from:\n",
      "    https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py\n",
      "\n",
      "    The board is a 4x12 matrix, with (using Numpy matrix indexing):\n",
      "        [3, 0] as the start at bottom-left\n",
      "        [3, 11] as the goal at bottom-right\n",
      "        [3, 1..10] as the cliff at bottom-center\n",
      "\n",
      "    Each time step incurs -1 reward, and stepping into the cliff incurs -100 reward\n",
      "    and a reset to the start. An episode terminates when the agent reaches the goal.\n",
      "    \n"
     ]
    }
   ],
   "source": [
    "# create cliff world environment\n",
    "env = gym.envs.toy_text.CliffWalkingEnv()\n",
    "print(env.__doc__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create an Expected SARSA agent\n",
    "agent = ExpectedSARSAAgent(alpha=1.0, epsilon=0.2, gamma=0.99,\n",
    "                           get_possible_actions=lambda s : range(env.nA))\n",
    "\n",
    "# train agent and get rewards for episodes\n",
    "rewards = train_agent(env, agent, episode_cnt = 5000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEICAYAAAC3Y/QeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAkF0lEQVR4nO3deXxU9b3/8deHyKYsFoogBBUrLoAQIFKtgHFB0FIqWit6vWppRVra/mpvewvXey220qtir631IRat9dKrUDeKxUrRYsS1CLIIIqtQAlSQzUQBWT6/P84JTJLJJJNJMsk57+fjcR7MfL/nnPl8JsNnzvmeZczdERGReGmS7QBERKT+qfiLiMSQir+ISAyp+IuIxJCKv4hIDKn4i4jEkIq/pGRmN5nZawnPS8zs1PBxSzP7s5ntMbOnwrY7zewjM/tnFmLdYGaXVNJXYGZF9R1T1JnZKWbmZnZMtmOR9Kj4C2Y21Mzmm1mxmW03s1fMbESyed29lbuvD59+DegItHf3q82sK/BvQA9375TkdVaZ2dcTnp8fFo7ybSXZLCbhF4Wb2bPl2vuE7YVZCi22zOxCM3s53NDYkKR/g5ntDT87JWY2N8W6zMzuNrMd4XSPmVmdJtAAqfjHnJl9DXgKmAbkEhTz24GvVGPxk4HV7n4w4fkOd99WyfzzgQsSng8G3k/S9kbCOquTQ118UWwHvmRm7RPabgRW18Frpc3McrLwmtncuv8EeBT4cYp5vhJunLRy90tTzDcGuALoA/QGhgO31FagjYWKf5aZWWczeybc4v7AzL6f0DfRzJ40s2nhVvkKM8sP+8ab2dPl1vVrM7s/jdc24H+An7v7I+6+x90Pu/sr7n5zJcu4mZ1mZncQfElcE25p3QK8CHQOnz+WZPH5BMW91CDg7iRt88PXGhHmvNvMCs3srIQ4NpjZT8xsGfBJ+cIUDkk9Zma7zOw94Jzqvi+hz4A/AaPC9eUAXwceL/c6Z5rZi2a2M8mezZfNbLGZfWxmm8xsYkJf6XDJjWb2j3Co7LbKgglzmWJmfzGzT4ALK/vsmFmLcCv48+Hz/zSzg2bWJnx+p5n9Ko0Yv2lm/wDmmVmOmd0bxrse+HKa72uNuPsCd/8DsL7Kmat2I/BLdy9y983AL4GbamG9jYu7a8rSRPDlu4igiDYDTiX4cA8N+ycC+4DLgRzgv4G3wr6TgU+BNuHzHGArcG74/EFgdyXTsnCeMwEHuqWI8SbgtYTnDpyWEN//JfQVAEUp1nUScBhoF+a+DWgJbEpo203wZXA6wdbeEKAp8O/AWqBZuK4NwBKgK9Ayoe2S8PFdwKvhersCy1PFVi7OAqAI+BLw97DtcuCvwLeAwrDtuDD2bwDHAP2Aj4CeCes5O8yrN/AhcEXYd0r4Xj4cvgd9gP3AWZXE9BiwBzg/XN+xpP7szAeuCh/PBdYBlyX0jUwjxmlhri2BsQR7a13D9/blcJ5jKol7NpV/DmfX4P/MJcCGJO0bwti3h/n2SbGOPcAXE57nA8XZrgf1PWnLP7vOATq4+8/c/TMPxtIfJtzaDL3m7n9x90PAHwiKBO6+EXiHYPcV4CLgU3d/K+z/jrsfX8nUO1ymdEhja51mGXL3fwD/INi67wOscfe9wOsJbS2AvwPXAM+7+4vufgC4l6D4fClhlfe7+6ZwHeV9HZjk7jvdfRNQ7T2ihHjfANqZ2RnADQRFMNFwgkL0e3c/6O7vAM8QHAvB3Qvd/V0P9qaWAdMpO8QFcIe773X3pcDS8D2ozCx3f93dDxMU7FSfnVeAC8I9ot5h/heYWQuCz92racQ40d0/Cd/nrwO/Ct/3nQQbJKnew+EpPofDUy2bpn8h+LI6meAL6a9mdnwl87Yi+AIotQdoFbdxfxX/7DqZYJhkd+kE/AfBuHupxLNmPgVaJAxxPAFcGz6+Lnyejh3hvyemuVwmSod+BhMWIOC1hLa/u/t+oDOwsXSh
sOBtArokrGtTitfpXK5/Y2UzVuEPwHeBC4GZ5fpOBr5Y7u/3L0AnADP7YniQcruZ7SHYav58uXWU//u2ShFLYj5VfXZeIdiq7we8SzAkdwFwLrDW3T9KI8bE162t97VSZvYfdvTA7UPVWSb8Utzr7p+6+38T7FkMqmT2EqBNwvM2QImHuwFxoeKfXZuAD8ptDbV298urufxTQIGZ5QIjSSj+ZvZQwn+g8tOKcLZVYQxX1WZSVSgt/oM4WvxfTWibH7ZtIShwwJHjE12BzQnrSvWfdWs4f6mTahjvH4DvAH9x90/L9W0CXin392vl7t8O+58AngO6untb4CEgk63LxHyr+uy8AZxB8Ll4xd3fI3gPvkzwxVCqOjEmvm5a76uZvZDic/hC0iTdf+FHD9yOTbX+FJzK3+sVlN3D6hO2xYqKf3YtAD4OD1y2DA+m9TKzah2cdPftQCHwe4JCsDKhb2zCf6DyU89wHgd+CPyXmX3DzNqYWRMzG2hmU2s928B8oC/BVujrYdu7QDeCrevS4v8k8GUzu9jMmhKcQrqfoKhVx5PABDP7XPjl+L3EzvAA6mNVrcTdPwhjTXYwdjZwupn9q5k1DadzEg5MtwZ2uvs+MxtAsHdWW1J+dsIvqkXAOI4W+zcIzmpJLP7pxvgk8H0zyzWzzwHjU83s7pel+BxeVt1kw89lC4LjPxYe1G4W9p1kwSnCzcL2HxPsvbxeyeqmAT80sy5m1pngs/VYdWOJChX/LArH8b8C5AEfEBwsfARom8ZqniA4CJbukE9pDE8TjK+PJtja/hC4E5hVk/VV4/VWExzo3eruu8O2wwTFrA1hcXf3VcD1wG8I3pevEJzK91k1X+oOgiGJDwgOAP6hXH9XKi8O5WN+zd23JGkvBi4lGGffQjCEczfQPJzlO8DPzKyY4MDsk9WMvToxVeez8wpBsVyQ8Lw1R79gaxLjwwQHvpcSHHN6NvXstWYwsBf4C8Hexl6CvysEOU0BdhHsGQ4jOLi9A8DMBplZScK6fgv8mWCjYznwfNgWKxazYS4Rwi3GpUDv8GCySOyo+IuIxFDWhn3MbJgFF8WsNbOU44YiIlK7srLlb8HVkqsJLuApAt4Grg3PSBARkTqWrS3/AQTnGq8PD+DNAL6apVhERGInWzdq6kLZC0WKgC+Wn8nMxhDchImWLVv279q1a/lZquXw4cM0aRKvE5uUczzELee45QuZ57x69eqP3L1D+fZsFf9kF19UGH9y96nAVID8/HxfuHBhjV6ssLCQgoKCGi3bWCnneIhbznHLFzLP2cySXoWdra/QIspeJZhLcJ60iIjUg2wV/7eB7mbWLTznehTBJeYiIlIPsjLs4+4Hzey7BFcK5gCPunvs7q0hIpItWftlHnf/C8Gl2iIiUs/iddhcREQAFX8RkVhS8RcRiSEVfxGRGMraAd9seGDeGu6duzqjdXRt15JNO8v+ZOy4C79w5PG0Nzdy/LFNGdGnc0avk6mNGz/j7f3vZzWG+qacoy9u+UKQ8/mDDtM0p3a31RvNLZ0zucL3R4/OpXPuydw/b20tR3XUMU2Mg4ePvpc5TSyj3+vLlLsTs9+jVs4xELd8Ich5xc+G0aJpTo2WN7NF7p5fvj0WW/5Prz4Aq+uu8P/1B4M5o1NrVm79mMt+Hfws7eo7LyOnSfY+pLoMPh7ilnPc8oUg55oW/lRiUfxr4sIzOvDyqu0AXH/uSbQ7rjn7Dhxi6vz1AHz/4u4U7fyUdsc144xOrQE468Q23Ht1Hzq1aZHVwi8iUpXYF/+RfbswoFs7Jjz7Lj+4pDvnntqen85awZTr+3Pmf80BYOBpHRjWqxPAkeL/wyGnJ13f1/rn1k/gIiIZiH3x79m5DdcOOIlrB5x0pO2vtw4G4NIeHZn73ofZCk1EpM7E/lTPbw7slu0QRETqXeS3/D/8eF+lfRvu+nI9RiIi0nBEfst/0cZdSdtPaN28ymUbx0mwIiLpi/yWfzIPXNeX
i848odrzx+y0YhGJgchv+SfTtmVTjm0Wy+89EREgpsVfRCTuIl/8H3plXY2XbSR3vhARSVvki/+yoj0Zr0ND/iISNZEv/rVBOwAiEjUq/iIiMaTiXw0a9hGRqFHxT0kDPiISTSr+1RC3H48QkehT8RcRiSEVfxGRGFLxFxGJIRX/FHSFr4hEVZ0VfzObaGabzWxJOF2e0DfBzNaa2SozG1pXMYiISHJ1fWvL+9z93sQGM+sBjAJ6Ap2Bl8zsdHc/VMexiIhIKBvDPl8FZrj7fnf/AFgLDMhCHNWmEz1FJGrquvh/18yWmdmjZva5sK0LsClhnqKwrcHRkL+IRFVGwz5m9hLQKUnXbcAU4OcENfTnwC+B0STfkE5aZ81sDDAGoGPHjhQWFmYS7hFLly7j0OacKufbsSP4/d/ly9/lmG0ry/TVVix1paSkpMHHWNuUc/TFLV+ou5wzKv7ufkl15jOzh4HZ4dMioGtCdy6wpZL1TwWmAuTn53tBQUH6Qc55vkJTnz69GdS9Q5WLTtvwNmzfRq9eZ1PQo2OZ9dUolnpUWFjY4GOsbco5+uKWL9RdznV5ts+JCU9HAsvDx88Bo8ysuZl1A7oDC+oqDhERqaguz/a5x8zyCIZ0NgC3ALj7CjN7EngPOAiMa+hn+ujWPiISNXVW/N39X1P0TQIm1dVr1xbXVV4iElG6wldEJIZU/EVEYkjFvxo05i8iUaPiLyISQyr+Kehwr4hElYq/iEgMqfiLiMSQin81mO7rKSIRo+Kfgq7xEpGoUvGvDm34i0jEqPhXh/YARCRiVPxFRGJIxb86NOwjIhGj4p+CRntEJKpU/EVEYkjFX0QkhlT8q0FD/iISNSr+IiIxpOKfgn7GUUSiSsVfRCSGVPxFRGJIxb8aTL/jKCIRo+IvIhJDKv4iIjGk4i8iEkMq/iIiMaTiXw063CsiUaPin4Ku8RKRqMqo+JvZ1Wa2wswOm1l+ub4JZrbWzFaZ2dCE9v5m9m7Yd7/pPEoRkXqX6Zb/cuBKYH5io5n1AEYBPYFhwINmlhN2TwHGAN3DaViGMdQZfS2JSFRlVPzdfaW7r0rS9VVghrvvd/cPgLXAADM7EWjj7m96cOOcacAVmcRQlzTsIyJRdUwdrbcL8FbC86Kw7UD4uHx7UmY2hmAvgY4dO1JYWFgrwS1duoxDm3OqnG/Xrr0ALFu2jMNbys5fW7HUlZKSkgYfY21TztEXt3yh7nKusvib2UtApyRdt7n7rMoWS9LmKdqTcvepwFSA/Px8LygoSB1sMnOer9DUp09vBnXvUOWij6z9O+z4qOz84fpqFEs9KiwsbPAx1jblHH1xyxfqLucqi7+7X1KD9RYBXROe5wJbwvbcJO0iIlKP6upUz+eAUWbW3My6ERzYXeDuW4FiMzs3PMvnBqCyvYesc/2Eu4hEVKaneo40syLgPOB5M/srgLuvAJ4E3gPmAOPc/VC42LeBRwgOAq8DXsgkhvpgusxLRCImowO+7j4TmFlJ3yRgUpL2hUCvTF5XREQyoyt8RURiSMVfRCSGVPxT0EVeIhJVKv7VoNs8iEjUqPin0PVzxwLQukVdXQgtIpIdqmopTBzRkwvPPIHeuccfafvbv13AJ/sPZi8oEZFaoOKfQstmOQzrVfbOFl/o0CpL0YiI1B4N+4iIxFCsiv/5p7UH4LQTtPUuIvEWq2GfK/K68Pi3zs12GCIiWRerLX8REQmo+IuIxFCsir9+K15EJBCr4v/G2o+yHYKISIMQq+L/j52fZjsEEZEGIVbF/7Du1CYiAsSs+BeccUK2QxARaRBiVfzzuh6f7RBERBqEWBX/nCY620dEBGJW/Ht2bpPtEEREGoRYFf/jj22W7RBERBqEWBV/EREJqPiLiMSQir+ISAxFvvgPOKUdAO2P03i/iEipyBf/Y3J0eqeISHmRL/66o4OISEUZFX8zu9rMVpjZYTPLT2g/xcz2mtmScHoooa+/mb1rZmvN7H7TfZZF
ROpdplv+y4ErgflJ+ta5e144jU1onwKMAbqH07AMYxARkTRlVPzdfaW7r6ru/GZ2ItDG3d90dwemAVdkEoOIiKSvLn/AvZuZLQY+Bv7T3V8FugBFCfMUhW1JmdkYgr0EOnbsSGFhYdpB7N69F4DPDnxWo+Ubq5KSkljlC8o5DuKWL9RdzlUWfzN7CeiUpOs2d59VyWJbgZPcfYeZ9Qf+ZGY9gWTj+5UeknX3qcBUgPz8fC8oKKgq3AoeWv0m7NxJs6bNqMnyjVVhYWGs8gXlHAdxyxfqLucqi7+7X5LuSt19P7A/fLzIzNYBpxNs6ecmzJoLbEl3/TWhk35ERI6qk1M9zayDmeWEj08lOLC73t23AsVmdm54ls8NQGV7D7UTS9KdDRGReMv0VM+RZlYEnAc8b2Z/DbsGA8vMbCnwNDDW3XeGfd8GHgHWAuuAFzKJQURE0pfRAV93nwnMTNL+DPBMJcssBHpl8rrpcA34iIhUEJsrfDX4IyJyVOSLv4iIVBSb4q/BHxGRoyJf/HXnIBGRiiJf/EVEpKLIF3/d0llEpKLoF/9sByAi0gBFvviX0tC/iMhRsSn+2gMQETkq8sVfW/wiIhVFvvhri19EpKLIF38REalIxV9EJIZU/EVEYijyxf+4ZjkAXDuga5YjERFpOCJf/Jsfk0On44wfXXpGtkMREWkwIl/8AY4xMN3hTUTkiFgUfxERKUvFX0QkhlT8RURiKPLFXz/gLiJSUeSLv4iIVKTiLyISQyr+IiIxpOIvIhJDKv4iIjEU+eKvH3AXEakoo+JvZpPN7H0zW2ZmM83s+IS+CWa21sxWmdnQhPb+ZvZu2He/1cN9F3RrBxGRsjLd8n8R6OXuvYHVwAQAM+sBjAJ6AsOAB80sJ1xmCjAG6B5OwzKMQURE0pRR8Xf3ue5+MHz6FpAbPv4qMMPd97v7B8BaYICZnQi0cfc33d2BacAVmcQgIiLpO6YW1zUa+GP4uAvBl0GporDtQPi4fHtSZjaGYC+Bjh07UlhYmHZQH320j8OHDtVo2caspKREOcdA3HKOW75QdzlXWfzN7CWgU5Ku29x9VjjPbcBB4PHSxZLM7ynak3L3qcBUgPz8fC8oKKgq3Aoe/8dCtu/dTk2WbcwKCwuVcwzELee45Qt1l3OVxd/dL0nVb2Y3AsOBi8OhHAi26BN/OisX2BK25yZpFxGRepTp2T7DgJ8AI9z904Su54BRZtbczLoRHNhd4O5bgWIzOzc8y+cGYFYmMYiISPoyHfN/AGgOvBieTvmWu4919xVm9iTwHsFw0Dh3PxQu823gMaAl8EI4iYhIPcqo+Lv7aSn6JgGTkrQvBHpl8roiIpIZXeErIhJDkS/+kPwUIxGROItF8RcRkbJU/EVEYkjFX0QkhmJQ/HXEV0SkvBgUf9AdnUVEyopF8RcRkbJU/EVEYkjFX0QkhiJf/HWFr4hIRZEv/iIiUpGKv4hIDKn4i4jEkIq/iEgMRb7463iviEhFkS/+oFs6i4iUF4viLyIiZan4i4jEkIq/iEgMRb74uy7xFRGpIPLFH3TAV0SkvFgUfxERKUvFX0QkhlT8RURiSMVfRCSGIl/8da6PiEhFGRV/M5tsZu+b2TIzm2lmx4ftp5jZXjNbEk4PJSzT38zeNbO1Zna/WT38vLpO9xERKSPTLf8XgV7u3htYDUxI6Fvn7nnhNDahfQowBugeTsMyjEFERNKUUfF397nufjB8+haQm2p+MzsRaOPub3pw9dU04IpMYhARkfTV5pj/aOCFhOfdzGyxmb1iZoPCti5AUcI8RWGbiIjUo2OqmsHMXgI6Jem6zd1nhfPcBhwEHg/7tgInufsOM+sP/MnMepJ89L3SY7JmNoZgiIiOHTtSWFhYVbgV7Nixj8OHDtVo2caspKREOcdA3HKOW75QdzlXWfzd/ZJU/WZ2IzAcuDgcysHd9wP7w8eLzGwdcDrBln7i0FAusCXFa08FpgLk5+d7QUFBVeFW
8Pv1C/jkwx3UZNnGrLCwUDnHQNxyjlu+UHc5Z3q2zzDgJ8AId/80ob2DmeWEj08lOLC73t23AsVmdm54ls8NwKxMYhARkfRVueVfhQeA5sCL4Rmbb4Vn9gwGfmZmB4FDwFh33xku823gMaAlwTGCF8qvVERE6lZGxd/dT6uk/RngmUr6FgK9MnnddOgiLxGRiiJ/ha+IiFSk4i8iEkMq/iIiMRT54q+fcRQRqSjyxR90XzcRkfJiUfxFRKQsFX8RkRhS8RcRiaFMr/BtFOrh52JEGo0DBw5QVFTEvn37sh1K2tq2bcvKlSuzHUa9qm7OLVq0IDc3l6ZNm1ZrvbEo/iJyVFFREa1bt+aUU06hPn5IrzYVFxfTunXrbIdRr6qTs7uzY8cOioqK6NatW7XWq2EfkZjZt28f7du3b3SFXypnZrRv3z6tvbnIF3+d5i9SkQp/9KT7N4188RcRkYpU/EWk3uXk5JCXl3dkuuuuu+r8NXfv3s2DDz6Y9nITJ07k3nvvrdC+atUqCgoKyMvL46yzzmLMmDFl+u+77z5atGjBnj17jrQVFhbStm1b+vbty5lnnsmPfvSjI30ffvghw4cPp0+fPvTo0YPLL7+8zPpmzpyJmfH++++nnUMyKv4iUu9atmzJkiVLjkzjx4+v89esafGvzPe//31uvfVWlixZwsqVK/ne975Xpn/69Omcc845zJw5s0z7oEGDWLx4MYsXL2b27Nm8/vrrANx+++0MGTKEpUuX8t5771X4Qpw+fToDBw5kxowZtRJ/5M/2cd3RX6RSd/x5Be9t+bhW19mjcxt++pWeaS+3Z88eBgwYwHPPPccZZ5zBtddey0UXXcTNN99Mq1atuOWWW/jb3/5G+/btmTFjBh06dGDdunWMGzeO7du3c+yxx/Lwww9z5pln8uGHHzJ27FjWr18PwJQpU7j//vtZt24deXl5DBkyhMmTJzN58mSefPJJ9u/fz8iRI7njjjsAmDRpEtOmTaNr16506NCB/v37V4h369at5OYe/VXas88++8jjdevWUVJSwuTJk/nFL37BTTfdVGH5li1bkpeXx+bNm4+s79JLLz3S37t37yOPS0pKeP3113n55ZcZMWIEEydOTPv9LS8WW/46tCXSsOzdu7fMsM8f//hH2rZtywMPPMBNN93EjBkz2LVrFzfffDMAn3zyCf369ePVV1/lggsuOFKkx4wZw29+8xsWLVrEvffey3e+8x0g2Cq/4IILWLp0Ke+88w49e/bkrrvu4gtf+AJLlixh8uTJzJ07lzVr1rBgwQKWLFnCokWLmD9/PosWLWLGjBksXryYZ599lrfffjtpDrfeeisXXXQRl112Gffddx+7d+8+0jd9+nSuvfZaBg0axKpVq9i2bVuF5Xft2sWaNWsYPHgwAOPGjeOb3/wmF154IZMmTWLLlqM/b/6nP/2JYcOGcfrpp9OuXTveeeedjP8Gkd/yF5HK1WQLvTaUDvuUN2TIEJ566inGjRvH0qVLj7Q3adKEa665hr1793L99ddz5ZVXUlJSwhtvvMHVV199ZL79+/cDMG/ePKZNmwYExxfatm3Lrl27yrzW3LlzmTt3Ln379gWCres1a9ZQXFzMyJEjOfbYYwEYMWJE0hy+8Y1vMHToUObMmcOsWbP47W9/y9KlS2nevDkzZsxg5syZNGnShCuvvPJITgCvvvoqvXv3ZtWqVYwfP55OnToBMHToUNavX8+cOXN44YUX6Nu3L8uXL6dFixZMnz6dH/zgBwCMGjWK6dOn069fv3Tf9jJU/EWkwTh8+DArV66kZcuW7Ny5s8ywSiIz4/Dhwxx//PFJv0Sqw92ZMGECt9xyS5n2X/3qV9U+bbJz586MHj2a0aNH06tXL5YvX07Tpk1Zs2YNQ4YMAeCzzz7j1FNPPVL8Bw0axOzZs1m9ejUDBw5k5MiR5OXlAdCuXTuuu+46rrvuOoYPH878+fPp378/8+bNY/ny5ZgZhw4dwsy4
5557MjplN/LDPjrPX6TxuO+++zjrrLOYPn06o0eP5sCBA0DwpfD0008D8MQTTzBw4EDatGlDt27deOqpp4CgmJfuLVx88cVMmTIFgEOHDvHxxx/TunVriouLj7zW0KFDefTRRykpKQFg8+bNbNu2jcGDBzNz5kz27t1LcXExf/7zn5PGOmfOnCPx/fOf/2THjh106dKF6dOnM3HiRDZs2MCGDRvYsmULmzdvZuPGjWWWP/3005kwYQJ33303EOytfPrpp0BwVe+6des46aSTmDVrFjfccAMbN25kw4YNbNq0iW7duvHaa69l9F5HvviD7u0j0tCUH/MfP348q1ev5pFHHuGXv/wlgwYNYvDgwdx5550AHHfccaxYsYLBgwczb948br/9dgAef/xxfve739GnTx969uzJrFmzAPj1r3/Nyy+/zNlnn03//v1ZsWIF7du35/zzz6dXr178+Mc/5tJLL+W6667jvPPO4+yzz+ZrX/saxcXF9OvXj2uuuYa8vDyuuuoqBg0alDSHuXPn0qtXL/r06cPQoUOZPHkynTp1YsaMGYwcObLMvCNHjkx6ls7YsWOZP38+H3zwAYsWLSI/P5/evXtz3nnn8a1vfYtzzjmHp59+usL6rrrqKp544omM/gbWWH7pKj8/3xcuXJj2ctc9/BYf7dzF3J9cVgdRNVyFhYUUFBRkO4x6pZyrZ+XKlZx11ll1E1AdadWqFSUlJbq3TxWS/W3NbJG755efNxZb/iIiUlbki38j2bERkRRKx+Wl9kS++ItIRY1luFeqL92/qYq/SMy0aNGCHTt26AsgQkrv59+iRYtqL6Pz/EViJjc3l6KiIrZv357tUNK2b9++tApcFFQ359Jf8qquyBd/3dtHpKymTZtW+9eeGprCwsIjV+TGRV3lnNGwj5n93MyWmdkSM5trZp0T+iaY2VozW2VmQxPa+5vZu2Hf/VYPvyqh0/xFRMrKdMx/srv3dvc8YDZwO4CZ9QBGAT2BYcCDZpYTLjMFGAN0D6dhGcYgIiJpyqj4u3vivWCPgyNjLF8FZrj7fnf/AFgLDDCzE4E27v6mB0ebpgFXZBKDiIikL+MxfzObBNwA7AEuDJu7AG8lzFYUth0IH5dvr2zdYwj2EgBKzGxVDcP8vI3noxou21h9HpRzDMQt57jlC5nnfHKyxiqLv5m9BHRK0nWbu89y99uA28xsAvBd4KckH2b3FO1JuftUYGpVMVbFzBYmu7w5ypRzPMQt57jlC3WXc5XF390vqea6ngCeJyj+RUDXhL5cYEvYnpukXURE6lGmZ/t0T3g6Aij9ZeHngFFm1tzMuhEc2F3g7luBYjM7NzzL5wZgViYxiIhI+jId87/LzM4ADgMbgbEA7r7CzJ4E3gMOAuPc/VC4zLeBx4CWwAvhVNcyHjpqhJRzPMQt57jlC3WUc6O5pbOIiNQe3dtHRCSGVPxFRGIo0sXfzIaFt5dYa2bjsx1PJszsUTPbZmbLE9ramdmLZrYm/PdzCX0N5vYaNWVmXc3sZTNbaWYrzOz/he2RzdvMWpjZAjNbGuZ8R9ge2ZwBzCzHzBab2ezwedTz3RDGusTMFoZt9Zuzu0dyAnKAdcCpQDNgKdAj23FlkM9goB+wPKHtHmB8+Hg8cHf4uEeYb3OgW/g+5IR9C4DzCK65eAG4LNu5pcj5RKBf+Lg1sDrMLbJ5h/G1Ch83Bf4OnBvlnMNYf0hwuvjsmHy2NwCfL9dWrzlHect/ALDW3de7+2fADILbTjRK7j4f2Fmu+avA/4aP/5ejt8qIxO013H2ru78TPi4GVhJcER7ZvD1Q+rNVTcPJiXDOZpYLfBl4JKE5svmmUK85R7n4dwE2JTxPeSuJRqqjB9dOEP57QtheWe5dSOP2Gg2JmZ0C9CXYEo503uEQyBJgG/Ciu0c9518B/05wynipKOcLwRf6XDNbFN7GBuo55yjfzz+tW0lETK3cXqOhMLNWwDPAD9z94xTDmpHI24Nr
YvLM7Hhgppn1SjF7o87ZzIYD29x9kZkVVGeRJG2NJt8E57v7FjM7AXjRzN5PMW+d5BzlLf/KbjERJR+Gu36E/24L2yNzew0za0pQ+B9392fD5sjnDeDuu4FCgtueRzXn84ERZraBYGj2IjP7P6KbLwDuviX8dxswk2CYul5zjnLxfxvobmbdzKwZwe8LPJflmGrbc8CN4eMbOXqrjEjcXiOM8XfASnf/n4SuyOZtZh3CLX7MrCVwCcFtUyKZs7tPcPdcdz+F4P/oPHe/nojmC2Bmx5lZ69LHwKXAcuo752wf9a7LCbic4AyRdQR3Ic16TBnkMh3YytHbYn8TaA/8DVgT/tsuYf7bwrxXkXAGAJAfftDWAQ8QXuXdECdgIMFu7DJgSThdHuW8gd7A4jDn5cDtYXtkc06It4CjZ/tENl+CMxCXhtOK0tpU3znr9g4iIjEU5WEfERGphIq/iEgMqfiLiMSQir+ISAyp+IuIxJCKv4hIDKn4i4jE0P8HX7cIpQ/4nA8AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot per episode reward graph\n",
    "plot_rewards(\"Cliff World\", rewards, 'Expected SARSA')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " >  >  >  >  >  >  >  >  >  >  >  v \n",
      " >  >  >  >  >  >  >  >  >  >  >  v \n",
      " ^  ^  ^  ^  ^  ^  ^  >  ^  ^  >  v \n",
      " X  C  C  C  C  C  C  C  C  C  C  T \n"
     ]
    }
   ],
   "source": [
    "# print CLiff Wolrd policy\n",
    "print_policy(env, agent)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Expected SARSA for \"Taxi\" environment "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEICAYAAAC3Y/QeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAkZUlEQVR4nO3de5wU5Z3v8c9vBoSRa0AFYVDGBJWLMMBIRBmcFRQ0BgXjop4sRmJQY2LiZnMC4bwUs7LBmKzRddUQNcazAqsmI0YXgh4ywWuICigXuYxgHEAxiMIoIJff+aNqhp6hey7d09MzXd/361Uvqp+q6vo93cyvqp966ilzd0REJFpyMh2AiIg0PyV/EZEIUvIXEYkgJX8RkQhS8hcRiSAlfxGRCFLyl1bHzBaZ2dWZjiPbmNkjZnZ7puOQ5qHkL2llZmvMrDKcDpnZvpjXP07mPd39Qnf/bQP3X2ZmbmZDapU/FZaXJBODNIwFbjezrWb2Sfh9DGzAdleH38+1zRFnFCn5S1q5+0B37+juHYEXgO9UvXb3f2umMDYAU6pemFl34Czgw2baf0Jm1iZD+81tpl1dDkwFioFuwCvA/61rAzP7AjADWJP26CJMyT8CzKyXmf3OzD40s81mdlPMsllm9riZPWpme8Iz9aJw2XQze7LWe91tZvc0QUxfNLOlZrbTzP5uZo+ZWdeYZR+Z2bCY+P9edZYenj025ozwMWByTMK7EigFPo+JJyesb3kY0+Nm1i1m+RNm9n549ros9uw1bC75TzN7NvwM/2JmX0xQ777hGe03zexvwNKwfKqZrTOzXWb2RzM7OSy/zcz+I5xva2afmtnPwtd54S+pLzQwxvvN7H/M7FPgH8xsqJm9Ecb830D7RnymDVUAvOju77j7IeC/gAH1bPNT4B7g72mIR0JK/lnOzHKAPwCrgN7AGOD7ZjYuZrUJwAKgK/A0cG9YPh+4yMw6h++VC/wjMC98fZ+ZfZxgerO+0Aj+yHsB/YE+wCwAdy8HfgQ8ZmbHAr8BHnH3siQ/hm3AWuCC8PUU4NFa69wEXAqcG8a0C/jPmOWLgH7ACcAbBAeUWFcCtwFfADYBs+uJ6VyCeo8zs0uBHwOTgOMJfiHND9f7M1ASzp8JvB9uCzASWO/uuxoY41VhXJ2A5cBTBGfh3YAngMsSBWtmo+r4rj82s1EJNl0AfMnMTjWztsDVwOI69jMCKAIeSLSONBF315TFE/Bl4G+1ymYAvwnnZwHPxywbAOyNef0iMCWcPx8oTyGWMuDaBMsuBVbUKnsaeAt4E2jXkPdJtE/g6wQJ9TRgQ7isAigJ59cBY2K2OxE4ALSJ855dAQe6hK8fAR6MWX4R8HaCePqG254SU7YI+GbM6xzgM+BkIA/YB3QHphMcJCqAjgQHm3sS7CdejI/GLB9NcFC0mLKXgdub+P/fMcDdYSwHgc1AQYJ1c4HXgJGN/Z41NX7SmX/2OxnoFXuWRpBAesSs837M/GdA+5i26HkEZ7UQnDnOa4qgzOwEM1sQXgjcTdAccFyt1X4NDAL+w933p7jL3wPnAd8lfpvzyUBpzGe0DjgE9DCzXDObEzYJ7Qa2hNvExlv7M+xYTzzv1dr33TH7/ojgl1Fvd99LkBDPJUjYfyZI0ueEZX+G4FdZA2KM3WcvYKuHWTb0bj0xJ+NWgl8sfQialW4Dloa/6Gr7NvCmu7+ShjikFiX/7PcesNndu8ZMndz9ogZu/wRQYmb5wERikr+ZPWBHeu7Unuq7WPdTgrPBwe7emeDM3GLeuyPwS+AhYFZs+3sy3P0zgjPsG4if/N8DLqz1ObV3960EB71LgLFAF4Kzd2LjTSakWvu+rta+89z95XD5nwkOXEOBv4avxwEjgGXhOg2JMXaf24HeZha7/KREwZpZcR3fdaWZFSfYdAjw3+5e4e4H3f0RgqaxeO3+Y4CJ4XWL94GzgV+Y2b1x1pUUKflnv+XAbjP7
UXiBMNfMBpnZmQ3Z2N0/JPj5/RuCg8i6mGXX+5GeO7Wn+rrzdQIqgY/NrDfww1rL7wZed/drgWdJ0AYccwG1bwOq82PgXHffEmfZA8DsmAutx5vZJTGx7gd2AscCTd1L6QFgRtUFWjPrYmaXxyz/M8F1irXu/jlHmrI2h99PMjG+QtAMc5OZtTGzSQQHk7jc/YU6vuuO7v5Cgk3/ClxuZj0suKj+T0BbgusitX2D4DpIYTi9RvBLYWY9dZEkKPlnOQ96WHyV4I9pM0EPigcJzg4bah7BGWWTNPmEbgOGAZ8QJPffVy0Ik+544Pqw6J+BYWb2v+K8Tx+C5oqt9e3Q3be5+4sJFt9NcI1hiZntAV4luF4CwcXhqn2sDZc1GXcvBe4AFoRNNquBC2NWeZmg7b/qLH8twXWAZTHrNCrG8CAyiSDh7gImE/MdNKE7CDobrAQ+Bm4GLnP3j6H6hr0fhzF97O7vV00EvbF2u/snaYgr8qxmk59I62Jm/wf40N1/lelYRFoTJX8RkQjKWLOPmY03s/VmtsnMpmcqDhGRKMrImX94s9AGgn7jFQQXha5097XNHoyISARl6sx/BLDJg1u+Pye4C/CSerYREZEmkpFBpQiGGYi94aSCIz0rqpnZNGAaQF5e3vA+ffoktbPDhw+TkxMc57bsPpzUe7QGvTvmYECbnKDOWA5/2xPUt2/nmsf52p/DyZ1yMIN9B6F9E/2v+Oygs/cA7DngdGxrHJfXuG7xBw5DjkFuAzeL/Z6jImp1jlp9IfU6b9iw4e/ufnzt8kwl/3h/zke1P7n7XGAuQFFRkb/22mtJ7aysrIySkhIA+k5/Nqn3aA3Wz76QtrnBf5KysjKGjjiHIT9ZQo7B+p9+pca6VZ/DO/92EQ7k5qRyv1Jij/3lXWaWrubKEX346aTBadlHldjvOSqiVueo1RdSr7OZxb1zO1OH0AqC/tlV8gnGGYm8ju0adjz+5qgCACYN611dlmu1EngD8nlOjqUt8QMcuaSUvn2ISONlKvn/FehnZgVmdgxwBcENNpGXKEV+97wvxS0fcGLn6vmcWkm86lhgtQ8KwIiCbow8pXtSMSYjTggikkEZafZx94Nm9h3gjwQj+T3s7npwQx1+cMFp/OCC05JqtoqXdx+/bmTqQTWA7iIRaZky1eaPu/8P8D+Z2n9L0v/Ezqzbvjt4EZOpu3U4ho8+/Tz+Rg3QIk62w3afFhGLAHDgwAEqKirYt29fpkNptC5durBu3br6V8wiDa1z+/btyc/Pp23btg1634wlfzli/MCeR5J/jJenn8f3Fqygd9ejR7+NvT3j3yaewbzlR1/TqWruaQlNLi0hBglUVFTQqVMn+vbtG7dJsCXbs2cPnTp1ynQYzaohdXZ3du7cSUVFBQUFBQ16XyX/FiDR31/7trn86p+K6t3+qi+fxFVfTjgaL5bB8241+7Q8+/bta5WJXxIzM7p3786HHzb8sdTR6jDbQqXrT7Al/GmPKAiG4R83sGeGI5FYSvzZp7Hfqc78s1j1/4UM/p2f3rMzW+Z8pf4VRaRZ6cw/w64bfUqN1+nI0zrHk5YmNzeXwsLC6mnOnDlp3+fHH3/Mfffd1+jtZs2axc9//vOjytevX09JSQmFhYX079+fadOm1Vh+11130b59ez755MjjCMrKyujSpQtDhw7l9NNP51/+5V+ql33wwQdcfPHFDBkyhAEDBnDRRTUftldaWoqZ8fbbbze6DvEo+bcAnmC+LuGNvOTU8VMvk239InXJy8tj5cqV1dP06ekf2DfZ5J/ITTfdxM0338zKlStZt24d3/3ud2ssnz9/PmeeeSalpaU1youLi1mxYgUrVqzgmWee4aWXXgLglltu4fzzz2fVqlWsXbv2qAPi/PnzGTVqFAsWLGiS+JX8W4BkBlb97ph+TBl5cp0Xej08lKh5V1qDTz75hNNOO43169cDcOWVV/LrX/8agI4dO/KDH/yA4uJixowZU31hs7y8
nPHjxzN8+HCKi4urz4o/+OADJk6cyJAhQxgyZAgvv/wy06dPp7y8nMLCQn74w+CpoXfeeSdnnnkmgwcP5tZbb62OZfbs2Zx22mmMHTu2Op7atm/fTn5+fvXrM844o3q+vLycyspKbr/9dubPnx93+7y8PAoLC9m6dWvc9xs8+MhwKJWVlbz00ks89NBDTZb81ebfwlw+vA8Pv7S53vU6t2/LTy4Z1KD31C8ASeS2P6xh7bajuxmnYkCvztz61bof4bx3714KCwurX8+YMYPJkydz77338o1vfIPvfe977Nq1i29961sAfPrppwwbNoxZs2Zx1113cdttt3Hvvfcybdo0HnjgAfr168df/vIXvv3tb7N06VJuuukmzj33XEpLSzl06BCVlZXMmTOH1atXs3LlSgCWLFnCxo0bWb58Oe7OhAkTWLZsGR06dGDBggWsWLGCgwcPMmzYMIYPH35UHW6++WbOO+88zj77bC644AKuueYaunbtCgRn6VdeeSXFxcWsX7+eHTt2cMIJJ9TYfteuXWzcuJHRo0cDcOONN1Z/BmPHjuWaa66hV69eADz11FOMHz+eU089lW7duvHGG28wbNiwZL6eakr+zeS8009g6ds76lznW8UFzLiwf53Jf91PxnOogT8V9JA2aamqmn1qO//883niiSe48cYbWbVqVXV5Tk4OkydPZu/evXz9619n0qRJVFZW8vLLL3P55Ueedb9//34Ali5dyqOPPgoE1xe6dOnCrl27auxryZIlLFmyhKFDhwLB2fXGjRvZs2cPEydO5Nhjg/trJkyYELcO11xzDePGjWPx4sUsXLiQX/3qV6xatYp27dqxYMECSktLycnJYdKkSdV1AnjhhRcYPHgw69evZ/r06fTsGfSEGzduHO+88w6LFy9m0aJFDB06lNWrV9O+fXvmz5/P97//fQCuuOIK5s+fr+TfWjz8jTPrHZohr23uUePzHLXOMbkN3mdV7lezjyRS3xl6czt8+DDr1q0jLy+Pjz76qEYzSCwz4/Dhw3Tt2jXuQaQh3J0ZM2Zw3XXX1Sj/5S9/2eBuk7169WLq1KlMnTqVQYMGsXr1atq2bcvGjRs5//zzAfj888855ZRTqpN/cXExzzzzDBs2bGDUqFFMnDix+ldQt27duOqqq7jqqqu4+OKLWbZsGcOHD2fp0qWsXr0aM+PQoUOYGT/72c9S6rKrNv8M6JJX8/ZrT9OtUO3bBF/vt0u+mJb3F2lqd911F/3792f+/PlMnTqVAwcOAMFB4cknnwRg3rx5jBo1is6dO1NQUMATTzwBBMm86tfCmDFjuP/++wE4dOgQu3fvplOnTuzZs6d6X+PGjePhhx+msrISgK1bt7Jjxw5Gjx5NaWkpe/fuZc+ePfzhD3+IG+vixYur43v//ffZuXMnvXv3Zv78+cyaNYstW7awZcsWtm3bxtatW3n33Zp34Z966qnMmDGDO+64Awh+rXz22WdAcFdveXk5J510EgsXLmTKlCm8++67bNmyhffee4+CggJefPHFlD5rJf8MeO7m0Uy/8PTq19ecXcC4gT245pyG3ZbdUG1yc9gy5yt857x+Tfq+IqmqavOvmqZPn86GDRt48MEH+cUvfkFxcTGjR4/m9ttvB6BDhw6sWbOG0aNHs3TpUm655RYAHnvsMR566CGGDBnCwIEDWbhwIQB33303f/rTnzjjjDMYPnw4a9asoXv37pxzzjkMGjSIH/7wh1xwwQVcddVVjBw5kjPOOIOvfe1r7Nmzh2HDhjF58mQKCwu57LLLKC4ujluHJUuWMGjQIIYMGcK4ceO488476dmzJwsWLGDixIk11p04cWLcC7XXX389y5YtY/Pmzbz++usUFRUxePBgRo4cybXXXsuZZ57Jk08+edT7XXbZZcybNy+l7yAjz/BNRmt/mMuWOV+p3vfymWP4/RtbmbPoba4bfQozLupfY92q9VK5OUoPvYiGZOq8bt06+vfvX/+KLUjH
jh2prKzU2D71iPfdmtnr7n7UODE688+QVnLMFZEspeTfjI7r2C7TIYi0SlXt8tJ0lPyb1ZHTffXAkUxqLc290nCN/U6V/JvBbRNqdqfTTVeSSe3bt2fnzp06AGSRqvH827dv3+Bt1M+/GVx9dl9A7fzSMuTn51NRUdGosd9bin379jUqwWWDhta56kleDaXknwFm9R8Irj9XffMlPdq2bdvgpz21NGVlZdV35EZFuuqs5N+M4ub7OC1AGv9eRNJNbf5NqFP7hh1La+R7NQWJSAYo+Teh3l3z6lx+24SBHNfxGDrntVVvHxHJKDX7NKOvDunFV4f0ynQYIiI6888U9fwRkUxS8m9CSQ2vquYfEckAJf9M0y8AEckAJf8M0QVfEcmktCV/M5tlZlvNbGU4XRSzbIaZbTKz9WY2Ll0xNLfG5HO1+YtIJqW7t89d7v7z2AIzGwBcAQwEegHPm9mp7n4ozbG0TPoFICIZkIlmn0uABe6+3903A5uAERmIo0l94+y+asoRkVYj3cn/O2b2ppk9bGZfCMt6A+/FrFMRlrVqxzbiweo1qPlHRDIgpWYfM3se6Bln0UzgfuBfCdLbvwK/AKYSv6Ej/rA3ZtOAaQA9evSgrKwsqTgrKyuT3rah/va3v1FZGb/lKt6+N7/zebDde+9RVvZBk8fTHHVuaVTn7Be1+kL66pxS8nf3sQ1Zz8x+DTwTvqwA+sQszge2JXj/ucBcCJ7hm+zzWWs853Rxep7he9JJJ7F534ewe/dRy+LFvY5y2PA2J/XpQ0lJ0z9PVc+zjYao1Tlq9YX01TmdvX1OjHk5EVgdzj8NXGFm7cysAOgHLE9XHM0pqTZ/XScQkQxIZ2+fn5lZIUGTzhbgOgB3X2NmjwNrgYPAjdnS00dP6BKR1iJtyd/d/6mOZbOB2enad6uiC74ikgG6wzdD1C1URDJJyT9DdIeviGSSkn8T0gVfEWktlPybkPK4iLQWSv6ZpuYfEckAJf8M0QVfEckkJf8M0QVfEcmkSCX/pW83/Rg6NegxjiLSSkQq+S9Zk77krxN5EWlNIpX8000n8SLSWij5NxEjyYu4+skgIhmg5C8iEkFK/k0k6RN4tRWJSAYo+YuIRJCSfxMxdBIvIq2Hkn+m6YKviGRApJJ/OodUcMDi7GDcwB7p26mISJLS+RjHFqe5h1TYMucr9a+ktiIRyYBInfmnw5cLumU6BBGRRotM8r/9mbW8sPHvTf6+o089HtAFXxFpXSLT7PPgi5vT+v5Jtyjpgq+IZEBkzvybg8boF5HWQsk/TdrkNPBIoAOGiGSAkn8j9e1+bL3rTBrWm8XfH90M0YiIJEfJv5EmFPZOuMzC0/jJRX340gkdmyskEZFGU/JvpA7H5Na7jq7hikhLp+TfSMNO/gK/uHxIpsMQEUmJkn8SLhueXz1fdWE3V119RKQVSSn5m9nlZrbGzA6bWVGtZTPMbJOZrTezcTHlw83srXDZPRZvQJxWZMrIvlxzTl9uKPlidc+d5h5GQkSksVI9818NTAKWxRaa2QDgCmAgMB64z8yqGsvvB6YB/cJpfIoxZFTeMbnc+tWBdGjXRr02RaTVSCn5u/s6d18fZ9ElwAJ33+/um4FNwAgzOxHo7O6vuLsDjwKXphJDS+QNuOR7bHjhuMMxkbnJWkRakHRlnt7AqzGvK8KyA+F87fK4zGwawa8EevToQVlZWVLBVFZW0lR3U61YsYJPtxzp8RMb0yef7AVg1cpVfP5e3b2Ceh92/vHUtvS3CsrKtjZJbLEqKyuT/rxaK9U5+0WtvpC+Oteb/M3seaBnnEUz3X1hos3ilHkd5XG5+1xgLkBRUZGXlJTUHWwCwQf3aVLb1jZs6FCK+naDxc8CEBtTZbdtfGfeCr52wSiO79Su3vca2yQRxVdWVkayn1drpTpnv6jVF9JX53qTv7snk6MqgD4xr/OBbWF5fpzyVqOuBp2LB/fi4sG9mi0WEZFkpaur59PAFWbW
zswKCC7sLnf37cAeMzsr7OUzBUj060FERNIk1a6eE82sAhgJPGtmfwRw9zXA48BaYDFwo7sfCje7AXiQ4CJwObAolRhERKTxUrrg6+6lQGmCZbOB2XHKXwMGpbJfERFJje7wbST15ReRbKBO5kmaNLQ3o/odl+kwRESSouTfSFW9ff59cmEmwxARSYmafUREIkjJv5HU5i8i2UDJX0QkgpT8RUQiSMlfRCSClPwb6eTuHTIdgohIypT8G6kho3WKiLR0Sv4iIhGk5C8iEkFK/o3w4JSi+lcSEWkFlPwboeS04zMdgohIk1Dyb4Q2ufq4RCQ7KJuJiESQkr+ISAQp+YuIRJCSv4hIBCn5i4hEkJK/iEgEKfmLiESQkr+ISAQp+YuIRJCSv4hIBCn5i4hEkJK/iEgEpZT8zexyM1tjZofNrCimvK+Z7TWzleH0QMyy4Wb2lpltMrN7zMxSiUFERBov1TP/1cAkYFmcZeXuXhhO18eU3w9MA/qF0/gUYxARkUZKKfm7+zp3X9/Q9c3sRKCzu7/i7g48ClyaSgwiItJ4bdL43gVmtgLYDfwfd38B6A1UxKxTEZbFZWbTCH4l0KNHD8rKypIKpLKyEki9dSnZ/WdCZWVlq4q3KajO2S9q9YX01bne5G9mzwM94yya6e4LE2y2HTjJ3Xea2XDgKTMbSPwM7In27e5zgbkARUVFXlJSUl+4cQUf3KdJbRsr2f1nQllZWauKtymoztkvavWF9NW53uTv7mMb+6buvh/YH86/bmblwKkEZ/r5MavmA9sa+/4iIpKatHT1NLPjzSw3nD+F4MLuO+6+HdhjZmeFvXymAIl+PYiISJqk2tVzoplVACOBZ83sj+Gi0cCbZrYKeBK43t0/CpfdADwIbALKgUWpxCAiIo2X0gVfdy8FSuOU/w74XYJtXgMGpbJfERFJje7wFRGJICV/EZEIUvIXEYkgJX8RkQhS8hcRiSAlfxGRCFLyFxGJICV/EZEIUvIXEYkgJX8RkQhS8q9Dh2NyMx2CiEhaKPnXYck/n8tvp47IdBgiIk0unU/yajF2fHY4qe16d82jd9e8Jo5GRCTzInHm/3T5gUyHICLSokQi+YuISE1K/iIiEaTkLyISQZG44Gspbn/WKd3Yvfdgk8QiItISRCL5e4rbL5g2skniEBFpKdTsIyISQUr+IiIRFInkn2qbv4hItolE8hcRkZqU/EVEIkjJX0QkgpT8RUQiSMlfRCSCUkr+Znanmb1tZm+aWamZdY1ZNsPMNpnZejMbF1M+3MzeCpfdY2bqjCMi0sxSPfN/Dhjk7oOBDcAMADMbAFwBDATGA/eZWdVjse4HpgH9wml8ijGIiEgjpZT83X2Ju1cNevMqkB/OXwIscPf97r4Z2ASMMLMTgc7u/oq7O/AocGkqMYiISOM15dg+U4H/Dud7ExwMqlSEZQfC+drlcZnZNIJfCfTo0YOysrKkAjtw4ADJ3OqV7P5agsrKylYdfzJU5+wXtfpC+upcb/I3s+eBnnEWzXT3heE6M4GDwGNVm8VZ3+soj8vd5wJzAYqKirykpKS+cON66K0/huE1TrL7awnKyspadfzJUJ2zX9TqC+mrc73J393H1rXczK4GLgbGhE05EJzR94lZLR/YFpbnxykXEZFmlGpvn/HAj4AJ7v5ZzKKngSvMrJ2ZFRBc2F3u7tuBPWZ2VtjLZwqwMJUYRESk8VJt878XaAc8F/bYfNXdr3f3NWb2OLCWoL3lRnc/FG5zA/AIkAcsCicREWlGKSV/d/9SHctmA7PjlL8GDEplvyIikhrd4SsiEkFK/iIiEaTkLyISQUr+IiIRpOQvIhJBSv4iIhEUieSvQaNFRGqKRPIXEZGaIpH8PeHQcSIi0RSJ5C8iIjVFIvmrzV9EpKZIJH8REalJyV9EJIKU/EVEIkjJP4F7rhya6RBERNJGyT+B804/IdMhiIikjZK/iEgEKfmLiESQkn8C7dvooxGR7KUMl0Cb
XH00IpK9lOFERCJIyV9EJILaZDqAluaeK4fiGgZURLKckn8tA3t15ovHd8x0GCIiaaVmn1r6du+Q6RBERNJOyb+W3ByN/ywi2U/JX0QkglJK/mZ2p5m9bWZvmlmpmXUNy/ua2V4zWxlOD8RsM9zM3jKzTWZ2j5ketSIi0txSPfN/Dhjk7oOBDcCMmGXl7l4YTtfHlN8PTAP6hdP4FGOol44uIiI1pZT83X2Jux8MX74K5Ne1vpmdCHR291c86E/5KHBpKjE0hDpuiojU1JRt/lOBRTGvC8xshZn92cyKw7LeQEXMOhVhmYiINKN6+/mb2fNAzziLZrr7wnCdmcBB4LFw2XbgJHffaWbDgafMbCDxW2ASnpib2TSCJiJ69OhBWVlZfeHGdfDAgQS7Plqy+2hpKisrs6YuDaU6Z7+o1RfSV+d6k7+7j61ruZldDVwMjAmbcnD3/cD+cP51MysHTiU4049tGsoHttWx77nAXICioiIvKSmpL9y4frP6jwTHpvolu4+WpqysLGvq0lCqc/aLWn0hfXVOtbfPeOBHwAR3/yym/Hgzyw3nTyG4sPuOu28H9pjZWWEvnynAwlRiEBGRxkt1eId7gXbAc2GPzVfDnj2jgZ+Y2UHgEHC9u38UbnMD8AiQR3CNYFHtNxURkfRKKfm7+5cSlP8O+F2CZa8Bg1LZr4iIpEZ3+IqIRJCSv4hIBCn5i4hEUCSSv4Z3EBGpKRLJX8M7iIjUFInkLyIiNUUi+avZR0SkpkgkfxERqUnJX0QkgiKR/D/ar0u+IiKxIpH83/zwUKZDEBFpUSKR/OszfmC8xxWIiGQvJX/guE7HZDoEEZFmpeQPmDqDikjEKPkDrnuARSRilPxFRCJIyV9EJIKU/FGbv4hEj5I/YMr9IhIxSv4iIhGk5I9G/RSR6FHyB0ztPiISMUr+IiIRpOQvIhJBSv4iIhGk5I+6eopI9Cj5i4hEkJI/usNXRKInpeRvZv9qZm+a2UozW2JmvWKWzTCzTWa23szGxZQPN7O3wmX3WAvoZ5n5CEREmleqZ/53uvtgdy8EngFuATCzAcAVwEBgPHCfmeWG29wPTAP6hdP4FGMQEZFGSin5u/vumJcdoHpg/EuABe6+3903A5uAEWZ2ItDZ3V9xdwceBS5NJYamcG1xQaZDEBFpVm1SfQMzmw1MAT4B/iEs7g28GrNaRVh2IJyvXZ7ovacR/EoAqDSz9UmGeRzw90QLe90Rs887Eq3V6tRZ5yylOme/qNUXUq/zyfEK603+ZvY8EO8J5zPdfaG7zwRmmtkM4DvArcQfLsfrKI/L3ecCc+uLsT5m9pq7F6X6Pq2J6hwNUatz1OoL6atzvcnf3cc28L3mAc8SJP8KoE/MsnxgW1ieH6dcRESaUaq9ffrFvJwAvB3OPw1cYWbtzKyA4MLucnffDuwxs7PCXj5TgIWpxCAiIo2Xapv/HDM7DTgMvAtcD+Dua8zscWAtcBC40d0PhdvcADwC5AGLwindUm46aoVU52iIWp2jVl9IU50t6HQjIiJRojt8RUQiSMlfRCSCsjr5m9n4cHiJTWY2PdPxpMLMHjazHWa2Oqasm5k9Z2Ybw3+/ELOs1QyvkYiZ9TGzP5nZOjNbY2bfC8uztt5m1t7MlpvZqrDOt4XlWVtnADPLNbMVZvZM+Drb67sljHWlmb0WljVvnd09KycgFygHTgGOAVYBAzIdVwr1GQ0MA1bHlP0MmB7OTwfuCOcHhPVtBxSEn0NuuGw5MJLgnotFwIWZrlsddT4RGBbOdwI2hHXL2nqH8XUM59sCfwHOyuY6h7H+M0F38Wci8n97C3BcrbJmrXM2n/mPADa5+zvu/jmwgGDYiVbJ3ZcBH9UqvgT4bTj/W44MldGqhtdIxN23u/sb4fweYB3BHeFZW28PVIYv24aTk8V1NrN84CvAgzHFWVvfOjRrnbM5+fcG3ot5XedQEq1UDw/unSD894SwPFHde9OI4TVaEjPrCwwl
OBPO6nqHTSArgR3Ac+6e7XX+JfC/CbqMV8nm+kJwQF9iZq+Hw9hAM9c55bF9WrBGDSWRZZpkeI2Wwsw6Ar8Dvu/uu+to1syKentwT0yhmXUFSs1sUB2rt+o6m9nFwA53f93MShqySZyyVlPfGOe4+zYzOwF4zszermPdtNQ5m8/8Ew0xkU0+CH/6Ef67IyzPmuE1zKwtQeJ/zN1/HxZnfb0B3P1joIxg2PNsrfM5wAQz20LQNHuemf0X2VtfANx9W/jvDqCUoJm6Weuczcn/r0A/Mysws2MIni/wdIZjampPA1eH81dzZKiMrBheI4zxIWCdu/97zKKsrbeZHR+e8WNmecBYgmFTsrLO7j7D3fPdvS/B3+hSd/86WVpfADPrYGadquaBC4DVNHedM33VO50TcBFBD5FyglFIMx5TCnWZD2znyLDY3wS6A/8P2Bj+2y1m/ZlhvdcT0wMAKAr/o5UD9xLe5d0SJ2AUwc/YN4GV4XRRNtcbGAysCOu8GrglLM/aOsfEW8KR3j5ZW1+CHoirwmlNVW5q7jpreAcRkQjK5mYfERFJQMlfRCSClPxFRCJIyV9EJIKU/EVEIkjJX0QkgpT8RUQi6P8Dk4KW2OvWPWwAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# create taxi environment\n",
    "env = gym.make(\"Taxi-v3\")\n",
    "\n",
    "# create an Expected SARSA agent\n",
    "agent = ExpectedSARSAAgent(alpha=0.25, epsilon=0.2, gamma=0.99,\n",
    "                           get_possible_actions=lambda s: range(env.nA))\n",
    "\n",
    "# train the agent and collect the reward for each episode\n",
    "rewards = train_agent(env, agent, episode_cnt=5000)\n",
    "\n",
    "# plot the reward graph\n",
    "plot_rewards(\"Taxi\", rewards, 'Expected SARSA')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion\n",
    "\n",
    "We see that the Expected SARSA agent converges within about 1000 episodes of training on Cliff World. The update equation explains why: instead of bootstrapping from a single sampled next action, it averages $Q(S',A')$ over all next actions, weighted by the policy. So even with exploration, it is able to learn a better policy than regular SARSA. The policy learnt by the agent here goes through the middle row, which sits between the two: SARSA typically learns the safer path away from the cliff, while Q-learning learns the shortest path along the cliff edge."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
